 conformity score


Debiased Machine Learning for Conformal Prediction of Counterfactual Outcomes Under Runtime Confounding

Barnatchez, Keith, Josey, Kevin P., Nethery, Rachel C., Parmigiani, Giovanni

arXiv.org Machine Learning

Data-driven decision making frequently relies on predicting counterfactual outcomes. In practice, researchers commonly train counterfactual prediction models on a source dataset to inform decisions on a possibly separate target population. Conformal prediction has arisen as a popular method for producing assumption-lean prediction intervals for counterfactual outcomes that would arise under different treatment decisions in the target population of interest. However, existing methods require that every confounding factor of the treatment-outcome relationship used for training on the source data is additionally measured in the target population, risking miscoverage if important confounders are unmeasured in the target population. In this paper, we introduce a computationally efficient debiased machine learning framework that allows for valid prediction intervals when only a subset of confounders is measured in the target population, a common challenge referred to as runtime confounding. Grounded in semiparametric efficiency theory, we show the resulting prediction intervals achieve desired coverage rates with faster convergence compared to standard methods. Through numerous synthetic and semi-synthetic experiments, we demonstrate the utility of our proposed method.







31b3b31a1c2f8a370206f111127c0dbd-Paper.pdf

Neural Information Processing Systems

This framework can accommodate almost any choice of conformity scores, and in fact many different implementations have already been proposed to address our problem. However, it remains unclear how to implement a concrete method from this broad family that can lead to the most informative possible prediction intervals.
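To illustrate what "almost any choice of conformity scores" can mean in practice, here is a minimal sketch that assumes a split-conformal setup, which the excerpt itself does not specify. The model `predict`, the two score functions, and the helper names are hypothetical and chosen only for illustration. The calibration step is the same for both scores; only the score function changes.

```python
import math

def calibrate(score, cal_data, alpha):
    """Return the conformal threshold: the k-th smallest calibration score,
    with k = ceil((n + 1) * (1 - alpha)), or +inf when k exceeds n."""
    scores = sorted(score(x, y) for x, y in cal_data)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return scores[k - 1] if k <= n else math.inf

def prediction_set(score, threshold):
    """Return a membership test: y is in the set at x if its score is small enough."""
    return lambda x, y: score(x, y) <= threshold

def predict(x):
    # Hypothetical pre-trained point predictor (assumption for this sketch).
    return 2.0 * x

def abs_residual(x, y):
    # Conformity score 1: absolute residual.
    return abs(y - predict(x))

def scaled_residual(x, y):
    # Conformity score 2: residual scaled by a hypothetical difficulty proxy.
    return abs(y - predict(x)) / (1.0 + abs(x))

cal_data = [(0.0, 0.5), (1.0, 2.0), (2.0, 5.0)]
q_abs = calibrate(abs_residual, cal_data, alpha=0.25)
q_scaled = calibrate(scaled_residual, cal_data, alpha=0.25)
in_set = prediction_set(abs_residual, q_abs)
```

Both thresholds come from the same calibration routine, but they generally produce prediction sets of different shapes and sizes, which is why the choice of score affects how informative the resulting intervals are.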



Neural Information Processing Systems

The first method, explained in Section A1.4.1, consists of directly calibrating a sequence of nested two-sided intervals, as outlined in Section 3.3. The second method, explained in Section A1.4.2, consists of separately calibrating two sequences of lower and upper one-sided confidence intervals, each adopting the significance level $\alpha/2$ instead of $\alpha$. ... $\sum_{j=l}^{u} \hat{\phi}_j(x)$ among the feasible ones with minimal $|u - l|$, whenever the optimization problem does not have a unique solution. Therefore, we can assume without loss of generality that (1) has a unique solution; if that is not the case, we can break the ties at random by adding a little noise to $\hat{\phi}$. For any integer $T \geq 1$, consider an increasing sequence $t_\tau \in [0,1]$, for $\tau \in \{0,\dots,T\}$. A nested sequence of $T$ intervals indexed by $\tau \in \{0,\dots,T\}$, which may be written in the form $S_t = [\hat{L}_{m,\alpha}(X_{m+1}; t_\tau), \hat{U}_{m,\alpha}(X_{m+1}; t_\tau)]$, for appropriate lower and upper endpoints $\hat{L}_{m,\alpha}(X_{m+1}; t_\tau)$ and $\hat{U}_{m,\alpha}(X_{m+1}; t_\tau)$, respectively, is then constructed from (1) as follows.
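As a side illustration of the second method mentioned in this excerpt (separately calibrating lower and upper one-sided bounds, each at level α/2), the following is a rough sketch under simplifying assumptions. It is not the nested-interval construction from (1). It assumes preliminary lower and upper bounds are already available for each calibration point and shifts each by a conformal correction. All function names are hypothetical.

```python
import math

def conformal_quantile(scores, level):
    """k-th smallest score with k = ceil((n + 1) * level); +inf if k > n."""
    n = len(scores)
    k = math.ceil((n + 1) * level)
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]

def calibrate_one_sided(lower_pred, upper_pred, y_cal, alpha):
    """Calibrate the two sides separately, each at significance level alpha / 2."""
    # How far each preliminary lower bound sits above the observed outcome.
    lo_scores = [l - y for l, y in zip(lower_pred, y_cal)]
    # How far each observed outcome sits above the preliminary upper bound.
    hi_scores = [y - u for u, y in zip(upper_pred, y_cal)]
    q_lo = conformal_quantile(lo_scores, 1 - alpha / 2)
    q_hi = conformal_quantile(hi_scores, 1 - alpha / 2)
    return q_lo, q_hi

def predict_interval(lower, upper, q_lo, q_hi):
    """Shift the preliminary bounds for a new point by the calibrated corrections."""
    return lower - q_lo, upper + q_hi

# Toy calibration data: constant preliminary bounds [0, 10] (assumption).
y_cal = [1, 2, 3, 4, 5, 6, 12]
q_lo, q_hi = calibrate_one_sided([0] * 7, [10] * 7, y_cal, alpha=0.5)
interval = predict_interval(0, 10, q_lo, q_hi)
```

Each one-sided bound fails with probability at most α/2 under exchangeability, so by a union bound the combined two-sided interval fails with probability at most α. The price is that the interval can be more conservative than one obtained by calibrating both sides jointly.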


2b2011a7d5396faf5899863d896a3c24-Paper-Conference.pdf

Neural Information Processing Systems

A flexible conformal inference method is developed to construct confidence intervals for the frequencies of queried objects in very large data sets, based on a much smaller sketch of those data.